Consider informative selection of a sample from a finite population. Responses are realized
as independent and identically distributed (i.i.d.) random variables with a probability density
function (p.d.f.) $f$, referred to as the superpopulation model. The selection is informative in
the sense that the sample responses, given that they were selected, are not i.i.d. $f$. In general,
the informative selection mechanism may induce dependence among the selected observations.
The impact of such dependence on the empirical cumulative distribution function (c.d.f.) is
studied. An asymptotic framework and weak conditions on the informative selection mechanism
are developed under which the (unweighted) empirical c.d.f. converges uniformly, in $L_2$ and
almost surely, to a weighted version of the superpopulation c.d.f. This yields an analogue of the
Glivenko-Cantelli theorem. A series of examples, motivated by real problems in surveys and
other observational studies, shows that the conditions are verifiable for specified designs.

Keywords: complex survey; cut-off sampling; endogenous stratification; Glivenko-Cantelli; 
length-biased sampling; superpopulation 

1. Introduction 

Consider informative selection of a sample from a finite population, with responses $Y$
realized as independent and identically distributed (i.i.d.) random variables with probability
density function (p.d.f.) $f$, referred to as the superpopulation model. (Regression
problems, in which observations are conditionally independent given covariates, are also
of interest, but the following discussion readily generalizes to that setting and we restrict
attention to the i.i.d. case for simplicity of exposition.) In non-informative selection (e.g.,
Cassel et al. [6], Section 1.4, or Särndal et al. [34], Remark 2.4.4), the probability of



This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 
2012, Vol. 18, No. 4, 1361-1385. This reprint differs from the original in pagination and 
typographic detail. 



1350-7265 © 2012 ISI/BS 






D. Bonnery, F.J. Breidt and F. Coquet 



drawing the sample does not depend explicitly on the responses $Y$. We consider informative
selection in the sense that the sample responses, given that they were selected, are
not i.i.d. $f$. A specification of informative selection that includes the i.i.d. case described
here is given in Pfeffermann and Sverchkov [30], Remark 1.2. We study the implications
of this informative selection for estimation of the superpopulation model.

In general, the informative selection mechanism may induce dependence among the
selected observations. Nevertheless, a large body of current methodological literature
treats the observations as if they were independently distributed according to the sample
p.d.f., defined as the conditional distribution of the random variable $Y$, given that it
was selected. Under informative selection, the sample p.d.f. differs from $f$. In particular,
Pfeffermann et al. [26] (see some motivating work in Skinner [37]) have developed a sample
likelihood approach to estimation and inference for the superpopulation model, which
maximizes the criterion function formed by taking the product of the sample p.d.f.'s, as
if the responses were i.i.d. This methodology has been extended in a number of directions
(Eideh and Nathan [11-13], Pfeffermann et al. [27], Pfeffermann and Sverchkov [28, 29,
31]). An extensive review of these and other approaches to inference under informative
selection is given by Pfeffermann and Sverchkov [30].

Under a strong set of assumptions (in particular, sample size remains fixed as population
size goes to infinity), Pfeffermann et al. [26] have established the pointwise convergence
of the joint distribution of the responses to the product of the sample p.d.f.'s. This
is taken as partial justification of the sample likelihood approach. Alternatively, the full
likelihood of the data (selection indicators for the finite population and response variables
and inclusion probabilities for the sample) can be maximized (Breckling et al. [3], Chambers
et al. [7]), or the pseudo-likelihood can be obtained by plugging in Horvitz-Thompson
estimators for unknown quantities in the log-likelihood for the entire finite population
(e.g., Binder [2], Chambers and Skinner [8], Kish and Frankel [19], Section 2.4). Each of
these likelihood-based approaches requires a model specification.

Rather than starting at the point of likelihood-based inferential methods for the superpopulation
model, we take a step back and consider the problem of identifying a suitable
model using observed data. In an ordinary inference problem with i.i.d. observations, we
often begin not by constructing a likelihood and conducting inference, but by using basic
sample statistics to help identify a suitable model. In particular, under i.i.d. sampling
the empirical cumulative distribution function (c.d.f.) converges uniformly almost surely
to the population c.d.f., by the Glivenko-Cantelli theorem (e.g., van der Vaart [39],
Theorem 19.1). What is the behavior of the empirical c.d.f. under informative selection from
a finite population? In this paper, we develop an asymptotic framework and weak conditions
on the informative selection mechanism under which the (unweighted) empirical
c.d.f. converges uniformly, in $L_2$ and almost surely, to a weighted version of the
superpopulation c.d.f. The corresponding quantiles also converge uniformly on compact sets.
Our almost sure results rely on an embedding argument. Importantly, our construction
preserves the original response vector for the finite population, not some independent
replicate.

The conditions we propose are verifiable for specified designs, and involve computing
conditional versions of first and second-order inclusion probabilities. Motivated by real
problems in surveys and other observational studies, we give examples of where these
conditions hold and where they fail. Where the conditions hold, the convergence results
we obtain may be useful in making inference about the superpopulation model. For
example, the results may be used in identifying a suitable parametric family for the
weighted c.d.f., from which a selection mechanism and a superpopulation p.d.f. may be
postulated using results in Pfeffermann et al. [26].

2. Results 

2.1. Asymptotic framework and assumptions 

In what follows, all random variables are defined on a common probability space
$(\Omega, \mathscr{A}, P)$. Let $\mathscr{B}(\mathbb{R})$ denote the $\sigma$-field of Borel sets. Assume that for $k \in \mathbb{N}$,
$Y_k : (\Omega, \mathscr{A}, P) \to (\mathbb{R}, \mathscr{B}(\mathbb{R}))$ are i.i.d. real random variables with a density $f$ with respect to $\lambda$, the
Lebesgue measure. Consider $\{N_\gamma\}_{\gamma \in \mathbb{N}}$, an increasing sequence of positive integers
representing a sequence of population sizes, with $\lim_{\gamma \to \infty} N_\gamma = \infty$.

We consider a sequence of finite populations and samples. The $\gamma$th finite population is
the set of elements indexed by $U_\gamma = \{1, \ldots, N_\gamma\}$. In the sampling literature (e.g., Särndal
et al. [34]), $U_\gamma$ is often an unordered set, but it is convenient for us to order it and to
write, for example, $\sum_{k \in U_\gamma} = \sum_{k=1}^{N_\gamma}$. The vector of responses for the population is
$\mathcal{Y}_\gamma = (Y_k)_{k \in U_\gamma}$, and the sample is indexed by the random vector $\mathcal{I}_\gamma = (I_{\gamma k})_{k \in U_\gamma}$, where the $k$th
coordinate $I_{\gamma k}$ indicates the number of times element $k$ is selected: 0 or 1 under
without-replacement sampling, or a non-negative integer under with-replacement sampling. Define
the distribution of $\mathcal{I}_\gamma$ conditional on $\mathcal{Y}_\gamma$:

$$g_\gamma(i_1, \ldots, i_{N_\gamma}, y_1, \ldots, y_{N_\gamma}) = P(\mathcal{I}_\gamma = (i_1, \ldots, i_{N_\gamma}) \mid \mathcal{Y}_\gamma = (y_1, \ldots, y_{N_\gamma})).$$

We assume that the index of the element $k$ of the population plays no role in the way
elements are selected. Specifically, let $\sigma$ denote a permutation of a vector of length $N_\gamma$.
Then, for all $\gamma \in \mathbb{N}$, $(\mathcal{I}_\gamma \mid \mathcal{Y}_\gamma)$ and $(\sigma \cdot \mathcal{I}_\gamma \mid \sigma \cdot \mathcal{Y}_\gamma)$ are identically distributed, or equivalently

$$g_\gamma(i_1, \ldots, i_{N_\gamma}, y_1, \ldots, y_{N_\gamma}) = g_\gamma(\sigma \cdot (i_1, \ldots, i_{N_\gamma}), \sigma \cdot (y_1, \ldots, y_{N_\gamma})). \tag{1}$$

We refer to (1) as the exchangeability assumption. It corresponds to the condition of
weakly exchangeable arrays (Eagleson and Weber [10]) applied to $(I_{\gamma k}, Y_k)_{\gamma \in \mathbb{N}, k \in U_\gamma}$.

Definition 1. For $\gamma \in \mathbb{N}$, the empirical c.d.f. is the random process $F_\gamma : \mathbb{R} \to [0, 1]$ via

$$F_\gamma(\alpha) = \frac{\sum_{k \in U_\gamma} 1_{(-\infty, \alpha]}(Y_k) I_{\gamma k}}{1_{\{\mathcal{I}_\gamma = 0\}} + \sum_{k \in U_\gamma} I_{\gamma k}}.$$

Definition 2. Given $\gamma$, let $k, \ell \in U_\gamma$ with $k \neq \ell$. Assume exchangeability as in (1) and let

$$m_\gamma(y) = E[I_{\gamma k} \mid Y_k = y],$$
$$v_\gamma(y) = \mathrm{Var}(I_{\gamma k} \mid Y_k = y),$$
$$m'_\gamma(y_1, y_2) = E[I_{\gamma k} \mid Y_k = y_1, Y_\ell = y_2],$$
$$c_\gamma(y_1, y_2) = \mathrm{Cov}(I_{\gamma k}, I_{\gamma \ell} \mid Y_k = y_1, Y_\ell = y_2).$$

(These definitions do not depend on the choice of $k, \ell$ under the exchangeability
assumption.)
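Definition 1 is straightforward to compute from a response vector and a vector of selection counts. The following is a minimal sketch, assuming plain Python lists; the function name `empirical_cdf` is ours and is only illustrative. The denominator follows the convention of Definition 1, so an empty sample gives the value 0 rather than a division by zero.

```python
from typing import Sequence


def empirical_cdf(alpha: float, y: Sequence[float], counts: Sequence[int]) -> float:
    """Unweighted empirical c.d.f. at alpha for responses y and selection counts.

    counts[k] is the number of times element k was selected (0 or 1 without
    replacement, any non-negative integer with replacement).
    """
    # Numerator: selections whose response is at most alpha.
    numerator = sum(c for yk, c in zip(y, counts) if yk <= alpha)
    total = sum(counts)
    # Denominator: 1{no element selected} + total number of selections.
    denominator = total if total > 0 else 1
    return numerator / denominator
```

For example, with responses `[1, 2, 3, 4]` and counts `[0, 1, 2, 0]`, the value at `alpha = 2` is 1/3, since one of the three selections has a response not exceeding 2.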

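The following conditions on $m_\gamma$ are used in defining the limit c.d.f.:

A0. There exist $M : \mathbb{R} \to \mathbb{R}^+$ and $m : \mathbb{R} \to \mathbb{R}^+$, both $\lambda$-measurable, such that

$$\forall \gamma \in \mathbb{N}, \; m_\gamma \le M, \qquad \int M f \, d\lambda < \infty, \tag{0a}$$

$$m_\gamma \to m \text{ pointwise as } \gamma \to \infty, \qquad \int m f \, d\lambda > 0. \tag{0b}$$

Definition 3. Under A0, the limit c.d.f. $F_s : \mathbb{R} \to [0, 1]$ is

$$F_s(\alpha) = \frac{\int 1_{(-\infty, \alpha]} m f \, d\lambda}{\int m f \, d\lambda}.$$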
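The limit c.d.f. of Definition 3 is a ratio of two integrals and can be approximated numerically once $m$ and $f$ are specified. The sketch below is an illustration only: the function name `limit_cdf`, the midpoint rule, and the assumption that $f$ vanishes outside a known interval `[lo, hi]` are ours.

```python
from typing import Callable


def limit_cdf(alpha: float,
              m: Callable[[float], float],
              f: Callable[[float], float],
              lo: float, hi: float, steps: int = 20_000) -> float:
    """Approximate F_s(alpha) = int 1{y <= alpha} m f dλ / int m f dλ.

    Assumes the density f is zero outside [lo, hi] (an assumption of this sketch).
    """
    def integral(a: float, b: float) -> float:
        # Midpoint rule for the integral of m * f over [a, b].
        if b <= a:
            return 0.0
        h = (b - a) / steps
        return h * sum(m(a + (i + 0.5) * h) * f(a + (i + 0.5) * h)
                       for i in range(steps))

    upper = min(max(alpha, lo), hi)  # clip alpha to the support
    return integral(lo, upper) / integral(lo, hi)
```

As a check, take $f$ uniform on $[1, 2]$ and $m(y) = y/2$. Then $F_s(\alpha) = (\alpha^2 - 1)/3$ on $[1, 2]$, and `limit_cdf(1.5, lambda y: y / 2, lambda y: 1.0, 1.0, 2.0)` agrees with $(1.5^2 - 1)/3$ up to rounding, because the midpoint rule is exact for linear integrands.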



Remark (relation to sample p.d.f.). Because of informative selection, the empirical
c.d.f. does not converge to the superpopulation c.d.f. Under some conditions to be
specified below, it converges to $F_s$, a weighted integral of the superpopulation p.d.f. To
see this, consider the case of without-replacement sampling and a single element, $k$. The
sample p.d.f. defined in Krieger and Pfeffermann [20] is the conditional density of $Y_k$
given $I_{\gamma k} = 1$. By Bayes' rule,

$$f^s_\gamma(y) = \frac{P(I_{\gamma k} = 1 \mid Y_k = y) f(y)}{\int P(I_{\gamma k} = 1 \mid Y_k = y) f \, d\lambda} = \frac{m_\gamma(y)}{\int m_\gamma f \, d\lambda} f(y) = w_\gamma(y) f(y).$$

Define $w = \lim_{\gamma \to \infty} w_\gamma$ and consider $\alpha \in \mathbb{R}$. Then

$$\lim_{\gamma \to \infty} \int 1_{(-\infty, \alpha]} f^s_\gamma \, d\lambda = \lim_{\gamma \to \infty} \int 1_{(-\infty, \alpha]} w_\gamma f \, d\lambda = \int 1_{(-\infty, \alpha]} w f \, d\lambda = F_s(\alpha).$$

Thus, if observations were i.i.d. from the sample p.d.f., $F_s$ would be the natural limiting
c.d.f. A related argument can be used to show that the same weighted c.d.f. is obtained
under with-replacement sampling and a fixed number of draws, when considering the
distribution of any observation in the sample.
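Because informative selection from a finite population may induce dependence among
the selected observations, observations are not i.i.d., and we next specify asymptotic weak
dependence conditions among the coordinates of $\mathcal{I}_\gamma$.

For a sequence $\{b_\gamma\}$, let $o_\gamma(b_\gamma)$ denote a quantity such that $\lim_{\gamma \to \infty} o_\gamma(b_\gamma) b_\gamma^{-1} = 0$. In the next two
assumptions, we define sufficient conditions for uniform $L_2$ convergence and uniform a.s.
convergence of the empirical c.d.f.

A1 (Uniform $L_2$ convergence conditions).

$$\iint c_\gamma(y_1, y_2) f(y_1) f(y_2) \, dy_1 \, dy_2 = o_\gamma(1), \tag{1a}$$

$$\iint \big(m'_\gamma(y_1, y_2) m'_\gamma(y_2, y_1) - m_\gamma(y_1) m_\gamma(y_2)\big) f(y_1) f(y_2) \, dy_1 \, dy_2 = o_\gamma(1), \tag{1b}$$

$$\int (v_\gamma + m_\gamma^2) f \, d\lambda = o_\gamma(N_\gamma), \tag{1c}$$

$$P(\mathcal{I}_\gamma = (0, \ldots, 0)) = o_\gamma(1). \tag{1d}$$

A2 (Uniform almost sure convergence conditions). Let $y \in \mathbb{R}^{\mathbb{N}}$ satisfy

$$\sup_{\alpha' \in \mathbb{R}} \left| \frac{\sum_{k \in U_\gamma} 1_{(-\infty, \alpha']}(y_k)}{N_\gamma} - \int 1_{(-\infty, \alpha']} f \, d\lambda \right| = o_\gamma(1).$$

Then for all $\alpha \in \mathbb{R}$,

$$\mathrm{Var}\Big(\sum_{k \in U_\gamma} 1_{(-\infty, \alpha]}(y_k) I_{\gamma k} \,\Big|\, \mathcal{Y}_\gamma = (y_1, \ldots, y_{N_\gamma})\Big) = o_\gamma(N_\gamma^2), \tag{2a}$$

$$\sum_{k \in U_\gamma} 1_{(-\infty, \alpha]}(y_k) \big(E[I_{\gamma k} \mid \mathcal{Y}_\gamma = (y_1, \ldots, y_{N_\gamma})] - m_\gamma(y_k)\big) = o_\gamma(N_\gamma), \tag{2b}$$

$$g_\gamma((0, \ldots, 0), y) = o_\gamma(1). \tag{2c}$$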

Properties of sampling without replacement

In the case of sampling without replacement, $\mathcal{I}_\gamma : \Omega \to \{0, 1\}^{N_\gamma}$, A0 and A1 can be
replaced by a simpler set of sufficient conditions for uniform $L_2$ convergence.

A3 (Uniform $L_2$ convergence conditions under sampling without replacement).

$$\exists\, m : \mathbb{R} \to \mathbb{R}^+ \; \lambda\text{-measurable s.t. } m_\gamma \to m \text{ pointwise as } \gamma \to \infty \text{ and } \int m f \, d\lambda > 0, \tag{3a}$$

$$\forall y_1, y_2, \quad c_\gamma(y_1, y_2) = o_\gamma(1), \tag{3b}$$

$$\forall y_1, y_2, \quad m'_\gamma(y_1, y_2) - m_\gamma(y_2) = o_\gamma(1), \tag{3c}$$

$$P(\mathcal{I}_\gamma = (0, \ldots, 0)) = o_\gamma(1). \tag{3d}$$

These conditions imply A0 and A1.

Proof. Since $I_{\gamma k} \in \{0, 1\}$, (0a) and (1c) always hold. By applying the Lebesgue dominated
convergence theorem, we obtain that (1a) is verified when $\forall y_1, y_2$, $c_\gamma(y_1, y_2) = o_\gamma(1)$
and (1b) is verified when $\forall y_1, y_2$, $m'_\gamma(y_1, y_2) - m_\gamma(y_2) = o_\gamma(1)$. □

An important special case of sampling without replacement is non-informative selection,
with $\mathcal{I}_\gamma$ independent of $\mathcal{Y}_\gamma$ for all $\gamma \in \mathbb{N}$. In this case, the sample obtained is an i.i.d.
sample of size $n_\gamma = \sum_{k \in U_\gamma} I_{\gamma k}$ (Fuller [14], Theorem 1.3.1), and the classic
Glivenko-Cantelli theorem can be applied as soon as $n_\gamma \to \infty$ almost surely as $\gamma \to \infty$. The assumptions of
Theorem 1 and Theorem 2 will then just ensure that the expectation of the sample size
will grow to infinity, and that its variations are small enough to avoid very small samples.
We can thus replace A0-A2 by a simpler set of sufficient conditions.

A4 (Uniform $L_2$ and a.s. convergence conditions under independent sampling
without replacement).

$$N_\gamma^{-1} E[n_\gamma] \to m \neq 0 \text{ as } \gamma \to \infty, \qquad \mathrm{Var}(n_\gamma) = o_\gamma(N_\gamma^2). \tag{4}$$

These conditions imply A0-A2.

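Proof. We first show that A4 implies A3. Because $\mathcal{I}_\gamma$ and $\mathcal{Y}_\gamma$ are independent, the
exchangeability assumption implies $m_\gamma(y) = E[I_{\gamma 1}] = N_\gamma^{-1} E[n_\gamma]$, and $N_\gamma^{-1} E[n_\gamma] \to m$
by A4, so (3a) holds. Exchangeability also implies

$$E[I_{\gamma 1} I_{\gamma 2}] = \frac{\sum_{k, \ell \in U_\gamma : k \neq \ell} E[I_{\gamma k} I_{\gamma \ell}]}{N_\gamma (N_\gamma - 1)} = E\left[\frac{\sum_{k, \ell \in U_\gamma : k \neq \ell} I_{\gamma k} I_{\gamma \ell}}{N_\gamma (N_\gamma - 1)}\right] = E\left[\frac{n_\gamma (n_\gamma - 1)}{N_\gamma (N_\gamma - 1)}\right],$$

so

$$c_\gamma(y_1, y_2) = \mathrm{Cov}(I_{\gamma 1}, I_{\gamma 2}) = \frac{E[n_\gamma](E[n_\gamma] - N_\gamma)}{N_\gamma^2 (N_\gamma - 1)} + \frac{\mathrm{Var}(n_\gamma)}{N_\gamma (N_\gamma - 1)} = o_\gamma(1) \tag{5}$$

by A4, so (3b) is obtained, and (3c) holds by independence. Finally

$$P(n_\gamma = 0) = P(n_\gamma < 1) = P(n_\gamma - E[n_\gamma] < 1 - E[n_\gamma]) \le P(|n_\gamma - E[n_\gamma]| > E[n_\gamma] - 1) \le \frac{\mathrm{Var}(n_\gamma)}{(E[n_\gamma] - 1)^2} = o_\gamma(1), \tag{6}$$

establishing (3d).

We next show that (4) implies A2. For all $\alpha \in \mathbb{R}$,

$$\mathrm{Var}\Big(\sum_{k \in U_\gamma} 1_{(-\infty, \alpha]}(y_k) I_{\gamma k} \,\Big|\, \mathcal{Y}_\gamma = (y_1, \ldots, y_{N_\gamma})\Big) = \sum_{k \in U_\gamma} 1_{(-\infty, \alpha]}(y_k) \mathrm{Var}(I_{\gamma k}) + \sum_{k \neq \ell} 1_{(-\infty, \alpha]}(y_k) 1_{(-\infty, \alpha]}(y_\ell) \mathrm{Cov}(I_{\gamma k}, I_{\gamma \ell})$$
$$\le N_\gamma + N_\gamma (N_\gamma - 1) o_\gamma(1) = o_\gamma(N_\gamma^2)$$

by equation (5), so (2a) holds. By independence,

$$E[I_{\gamma k} \mid \mathcal{Y}_\gamma = (y_1, \ldots, y_{N_\gamma})] = E[I_{\gamma k} \mid Y_k = y_k] = m_\gamma(y_k),$$

so (2b) holds. Finally,

$$g_\gamma((0, \ldots, 0), y) = P(n_\gamma = 0) = o_\gamma(1)$$

by independence and (6), so (2c) holds. □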
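The Chebyshev step used to control the probability of an empty sample can be checked numerically in a simple case. The sketch below assumes Bernoulli sampling with a common inclusion probability `p`, so that $n_\gamma$ is Binomial($N_\gamma$, $p$); the function name `empty_sample_bounds` is hypothetical and the bound is capped at 1 when $E[n_\gamma] \le 1$, where the Chebyshev argument does not apply.

```python
from typing import Tuple


def empty_sample_bounds(n_pop: int, p: float) -> Tuple[float, float]:
    """Return (exact P(n = 0), Chebyshev bound Var(n) / (E[n] - 1)^2).

    Assumes Bernoulli sampling: n ~ Binomial(n_pop, p). This design is an
    assumption of the sketch, chosen because P(n = 0) is known exactly.
    """
    exact = (1.0 - p) ** n_pop
    mean = n_pop * p
    var = n_pop * p * (1.0 - p)
    # The bound is only meaningful once the expected sample size exceeds 1.
    bound = min(1.0, var / (mean - 1.0) ** 2) if mean > 1.0 else 1.0
    return exact, bound


if __name__ == "__main__":
    for n_pop in (10, 100, 1000, 10_000):
        exact, bound = empty_sample_bounds(n_pop, 0.1)
        print(n_pop, exact, bound)
```

In this case the bound decreases at rate $1/N_\gamma$ while the exact probability decreases geometrically, so the bound is loose but sufficient for the $o_\gamma(1)$ conclusion.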

Remark. In conventional finite population asymptotics (Breidt and Opsomer [4, 5], Isaki
and Fuller [17], Robinson and Särndal [33]), conditions on design covariances $\mathrm{Cov}(I_{\gamma k}, I_{\gamma \ell})$
are imposed to guarantee that the Horvitz-Thompson estimator $\sum_{k \in U_\gamma} y_k I_{\gamma k} (E[I_{\gamma k}])^{-1}$
is consistent. Typically, these conditions imply that the variance of the
Horvitz-Thompson estimator is $O_\gamma(N_\gamma^2 / (N_\gamma \pi^*_\gamma))$, where $N_\gamma \pi^*_\gamma \to \infty$ is a sequence of lower
bounds on the expected sample size, $E[n_\gamma]$. These same conditions can be used to show
that $\mathrm{Var}(n_\gamma) = O_\gamma(N_\gamma^2 / (N_\gamma \pi^*_\gamma)) = o_\gamma(N_\gamma^2)$, agreeing with A4.

2.2. Uniform convergence of the empirical c.d.f. 

In this section, we state the main results of the paper: uniform $L_2$ convergence of the
empirical c.d.f. and uniform almost sure convergence of the empirical c.d.f. Important
corollaries yield uniform convergence of sample quantiles on compact sets. Proofs are
given in the Appendix.

2.2.1. Uniform L2 convergence of the empirical c.d.f. 

Theorem 1. Under A0 and A1, the empirical c.d.f. converges uniformly in $L_2$ in the
sense that

$$\sup_{\alpha \in \mathbb{R}} |F_\gamma(\alpha) - F_s(\alpha)| = \|F_\gamma - F_s\|_\infty \xrightarrow[\gamma \to \infty]{L_2} 0.$$

Definition 4. The limit quantiles $\xi_s : (0, 1) \to \mathbb{R}$ are given by

$$\xi_s(p) = \inf\{y \in \mathbb{R} : F_s(y) \ge p\}$$

and the empirical quantiles $\xi_\gamma : (0, 1) \to \mathbb{R}$ are given by

$$\xi_\gamma(p) = \inf\{y \in \mathbb{R} : F_\gamma(y) \ge p\}.$$

With this definition, we have the following corollary.

Corollary 1. Suppose that $F_s$ is continuous on $\mathbb{R}$ and $0 < F_s(y_1) = F_s(y_2) < 1 \Rightarrow y_1 = y_2$.
Then, under A0 and A1, the empirical quantiles converge uniformly in probability to
the limit quantiles,

$$\sup_{p \in K} |\xi_\gamma(p) - \xi_s(p)| \xrightarrow[\gamma \to \infty]{P} 0, \tag{7}$$

for all $K$ a compact subset of $(0, 1)$. Under the further hypothesis that $f$ has compact
support, the convergence is uniform in $L_2$:

$$\sup_{p \in K} |\xi_\gamma(p) - \xi_s(p)| \xrightarrow[\gamma \to \infty]{L_2} 0.$$

2.2.2. Uniform almost sure convergence of the empirical c.d.f.

The Glivenko-Cantelli theorem gives uniform almost sure convergence of the empirical
c.d.f. under i.i.d. sampling. We now consider uniform almost sure convergence under
dependent sampling satisfying the second-order conditions of A2.

Asymptotic arguments in survey sampling consist first in embedding a specific sample
scheme in a sequence of sample schemes. In the proof of the following representation
theorem, we link the elements of the sequence of sample schemes in a way that ensures
uniform almost sure convergence of the empirical c.d.f. We stress that in our result
the vector of responses for the population remains the original $\mathcal{Y}_\gamma = (Y_k)_{k \in U_\gamma}$, and not
another set of identically distributed random variables.

Theorem 2. Under A0 and A2, there exist sequences of random variables $(I'_{\gamma k})_{\gamma \in \mathbb{N}, k \in U_\gamma}$,
$(Y'_k)_{k \in \mathbb{N}}$ defined on the probability space $(\Omega \times [0, 1], \mathscr{A} \otimes \mathscr{B}_{[0,1]}, P' = P \otimes \lambda_{[0,1]})$ such that

• $\|F'_\gamma - F_s\|_\infty$ converges $P'$-a.s. to 0,

• $\forall \gamma \in \mathbb{N}$, $(\mathcal{I}'_\gamma, \mathcal{Y}'_\gamma)$ and $(\mathcal{I}_\gamma, \mathcal{Y}_\gamma)$ have the same law,

• $\forall \gamma \in \mathbb{N}$, $\omega \in \Omega$, $x \in [0, 1]$, $\mathcal{Y}'_\gamma(\omega, x) = \mathcal{Y}_\gamma(\omega)$,

where $\mathscr{B}_{[0,1]}$ is the $\sigma$-field of Borel sets, $\lambda_{[0,1]}$ is the Lebesgue measure on $[0, 1]$,
$\mathcal{I}'_\gamma = (I'_{\gamma k})_{k \in U_\gamma}$, $\mathcal{Y}'_\gamma = (Y'_k)_{k \in U_\gamma}$, and $F'_\gamma$ is the empirical c.d.f. computed from $\mathcal{I}'_\gamma$ and $\mathcal{Y}'_\gamma$.

Corollary 2. Suppose that $F_s$ is continuous and $0 < F_s(y_1) = F_s(y_2) < 1 \Rightarrow y_1 = y_2$.
If A0 and A2 hold, then for $(I'_{\gamma k})_{\gamma \in \mathbb{N}, k \in U_\gamma}$ and $(Y'_k)_{k \in \mathbb{N}}$ that satisfy the conditions of
Theorem 2, the empirical quantiles

$$\xi'_\gamma(p) = \inf\{y \in \mathbb{R} : F'_\gamma(y) \ge p\}$$

converge uniformly almost surely,

$$\sup_{p \in K} |\xi'_\gamma(p) - \xi_s(p)| \xrightarrow[\gamma \to \infty]{\text{a.s.}} 0,$$

for all $K$ a compact subset of $(0, 1)$.
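The empirical quantile of Definition 4 is a generalized inverse of a step function, so it can be computed with one sorted pass over the selected responses. The following is a minimal sketch under our own naming (`empirical_quantile`); it assumes a non-empty sample and a level `p` in $(0, 1]$.

```python
from typing import Sequence


def empirical_quantile(p: float, y: Sequence[float], counts: Sequence[int]) -> float:
    """Return inf{v : F(v) >= p} for the unweighted empirical c.d.f. F.

    y are the population responses and counts the selection counts; only
    elements with a positive count contribute to F.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty sample: the empirical quantile is undefined")
    cumulative = 0
    for yk, c in sorted(zip(y, counts)):
        if c == 0:
            continue
        cumulative += c
        # F jumps to cumulative / total at yk; the first jump reaching p is the infimum.
        if cumulative >= p * total:
            return yk
    raise ValueError("p must lie in (0, 1]")
```

With responses `[3, 1, 2, 4]` and counts `[2, 0, 1, 1]`, the selected values are 2, 3, 3, 4, so the quantile at `p = 0.5` is 3 and the quantile at `p = 0.25` is 2.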

3. Examples 

We now consider a series of examples of selection mechanisms, motivated by real problems
in surveys and other observational studies. We give examples where conditions A0, A1,
A2 hold and where they fail.

3.1. Non-informative selection without replacement 

• For any sequence of fixed-size without-replacement designs with $\mathcal{I}_\gamma$ independent of
$\mathcal{Y}_\gamma$ (e.g., simple random sampling, stratified sampling with stratification variables
independent of $\mathcal{Y}_\gamma$, rejective sampling (Hájek [15]) with inclusion probabilities
independent of $\mathcal{Y}_\gamma$, etc.), the condition A4 holds provided that $n_\gamma N_\gamma^{-1}$ converges to a
strictly positive sampling rate.

• For a sequence of Bernoulli samples with parameter $p \in (0, 1)$, the $\{I_{\gamma k}\}$ are i.i.d.
Bernoulli($p$) random variables, so $\mathrm{Var}(n_\gamma) = N_\gamma p (1 - p)$ and condition A4 holds.

• Poisson sampling corresponds to a design in which, given a random vector
$(\Pi_{\gamma 1}, \ldots, \Pi_{\gamma N_\gamma}) : \Omega \to (0, 1]^{N_\gamma}$, the $\{I_{\gamma k}\}$ are a sequence of independent Bernoulli($\Pi_{\gamma k}$)
random variables (Poisson [32]). In this case, the variance of $n_\gamma$ is given by

$$\mathrm{Var}(n_\gamma) = \sum_{k \in U_\gamma} E[\Pi_{\gamma k}(1 - \Pi_{\gamma k})] + \mathrm{Var}\Big(\sum_{k \in U_\gamma} \Pi_{\gamma k}\Big).$$

Note that the first term in this expression is always $o_\gamma(N_\gamma^2)$, so it suffices to consider
the second.

- In the case where the vector $[\Pi_{\gamma k}]_{k \in U_\gamma}$ is just a random permutation of a
non-random vector $[\pi_{\gamma k}]_{k \in U_\gamma}$, then $\mathrm{Var}(\sum_{k \in U_\gamma} \Pi_{\gamma k}) = \mathrm{Var}(\sum_{k \in U_\gamma} \pi_{\gamma k}) = 0$ and A4 is
satisfied when $N_\gamma^{-1} \sum_{k \in U_\gamma} \pi_{\gamma k}$ converges to a non-zero constant.

- Suppose that $Z_\gamma$ is a random positive real vector of size $N_\gamma$, and suppose that
the law of $(Z_\gamma, \mathcal{Y}_\gamma)$ is invariant under any permutation of the coordinates. For $n^*_\gamma$
fixed, consider the design in which $\Pi_{\gamma k} = n^*_\gamma Z_{\gamma k} \{\sum_{j \in U_\gamma} Z_{\gamma j}\}^{-1}$. Then

$$\mathrm{Var}\Big(\sum_{k \in U_\gamma} \Pi_{\gamma k}\Big) = \mathrm{Var}(n^*_\gamma) = 0$$

and A4 is satisfied when $Z_\gamma$ and $\mathcal{Y}_\gamma$ are independent and $N_\gamma^{-1} n^*_\gamma$ converges to a
non-zero constant.

- Let $a_\gamma, b_\gamma \in (0, 1]$ with $a_\gamma \neq b_\gamma$. If

$$(\Pi_{\gamma 1}, \ldots, \Pi_{\gamma N_\gamma}) = \begin{cases} (a_\gamma, \ldots, a_\gamma), & \text{with probability } 1/2, \\ (b_\gamma, \ldots, b_\gamma), & \text{with probability } 1/2, \end{cases}$$

then

$$\mathrm{Var}\Big(\sum_{k \in U_\gamma} \Pi_{\gamma k}\Big) = \frac{N_\gamma^2 (a_\gamma - b_\gamma)^2}{4},$$

which is not $o_\gamma(N_\gamma^2)$ when $a_\gamma - b_\gamma$ does not tend to zero. Then A4 is not verified and in fact if $N_\gamma a_\gamma = o_\gamma(1)$ we do not have uniform
convergence of the empirical c.d.f.
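The variance decomposition for Poisson sampling can be evaluated exactly for the two-point design in the last item, in which all inclusion probabilities equal `a` or all equal `b`, each with probability 1/2. The sketch below is an illustration under that assumption; the function name `var_sample_size_two_point` is ours, and exact rational arithmetic is used so that the two terms can be compared without rounding.

```python
from fractions import Fraction
from typing import Tuple


def var_sample_size_two_point(n_pop: int, a: Fraction, b: Fraction) -> Tuple[Fraction, Fraction]:
    """Return the two terms of Var(n) under the two-point Poisson design.

    First term:  sum_k E[Pi_k (1 - Pi_k)]   (within-design Bernoulli variance)
    Second term: Var(sum_k Pi_k)            (variance of the random total)
    """
    n = Fraction(n_pop)
    within = n * (a * (1 - a) + b * (1 - b)) / 2
    mean_total = n * (a + b) / 2
    # The total sum_k Pi_k equals n*a or n*b, each with probability 1/2.
    between = ((n * a - mean_total) ** 2 + (n * b - mean_total) ** 2) / 2
    return within, between


if __name__ == "__main__":
    for n_pop in (100, 1000, 10_000):
        within, between = var_sample_size_two_point(n_pop, Fraction(1, 10), Fraction(1, 2))
        # Ratio to N^2: the first term vanishes, the second stays at (a - b)^2 / 4.
        print(n_pop, float(within / n_pop ** 2), float(between / n_pop ** 2))
```

The second term grows like $N_\gamma^2$ whenever `a` and `b` stay apart, which is why the variance condition in A4 fails for this design.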



3.2. Length-biased sampling 

Length-biased sampling, in which $P(I_{\gamma k} = 1 \mid Y_k = y_k) = m_\gamma(y_k) \propto y_k$, is pervasive in real
surveys and observational studies. Cox [9] gives a now-classic example of sampling fibers
in textile manufacture, in which $m_\gamma(y_k) \propto y_k$ = fiber length. In surveys of wildlife
abundance, "visibility bias" means that larger individuals or groups are more noticeable (e.g.,
Patil and Rao [25]), so $m_\gamma(y_k) \propto y_k$ = size of individual or group. "On-site surveys"
are sometimes used to study people engaged in some activity like shopping in a mall
(Nowell and Stanley [24]) or fishing at the seashore (Sullivan et al. [38]); the longer they
spend doing the activity, the more likely the field staff are to intercept and interview
them, so $m_\gamma(y_k) \propto y_k$ = activity time. In mark-recapture surveys of wildlife populations,
individuals that live longer are more likely to be recaptured, so $m_\gamma(y_k) \propto y_k$ = lifetime
(e.g., Leigh [22]). Similarly, in epidemiological studies of latent diseases, individuals who
become symptomatic seek treatment and drop out of eligibility for sampling, while those
with long latency periods are more likely to be sampled: $m_\gamma(y_k) \propto y_k$ = latency period.
Finally, propensity to respond to a survey is often related to a variable of interest; for
example, higher response rates from higher-income individuals.

Suppose that $f$ has compact, positive support: $\int 1_{[\epsilon, M]} f \, d\lambda = 1$ for some $0 < \epsilon < M < \infty$.
For the $\gamma$th finite population, consider Poisson sampling with inclusion probability
proportional to $Y$, in the sense that $\{I_{\gamma k}\}_{k \in U_\gamma}$ are independent binary random variables,
with

$$P(I_{\gamma k} = 1 \mid Y_k = y_k) = 1 - P(I_{\gamma k} = 0 \mid Y_k = y_k) = m_\gamma(y_k) \propto y_k.$$

Let $\tau_\gamma = y_k^{-1} P(I_{\gamma k} = 1 \mid Y_k = y_k)$ be the common proportionality constant (independent
of $k$), and assume that $\tau_\gamma \to \tau \in (0, M^{-1}]$ as $\gamma \to \infty$. Then

$$m_\gamma(y) = \tau_\gamma y \to \tau y = m(y),$$
$$c_\gamma(y_k, y_\ell) = 0, \qquad m'_\gamma(y_k, y_\ell) - m_\gamma(y_k) = 0,$$
$$P(\mathcal{I}_\gamma = (0, \ldots, 0)) = E\Big[\prod_{k \in U_\gamma} (1 - \tau_\gamma Y_k)\Big] \le (1 - \tau_\gamma \epsilon)^{N_\gamma} = \exp(N_\gamma \ln(1 - \tau_\gamma \epsilon)) = o_\gamma(1),$$

so that A3 is verified. It then follows that the limiting c.d.f. is given by

$$F_s(\alpha) = \int 1_{(-\infty, \alpha]}(y) \frac{y}{E[Y_1]} f(y) \, d\lambda(y). \tag{8}$$
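A small simulation illustrates the limit in this example. The sketch below assumes $Y$ uniform on $[1, 2]$ and a proportionality constant of $1/2$ (both choices are ours, made so that the limit has a closed form): the weighted c.d.f. with weight $y / E[Y_1]$ is then $(\alpha^2 - 1)/3$ on $[1, 2]$, while the superpopulation c.d.f. is $\alpha - 1$. All function names are hypothetical.

```python
import random
from typing import Callable, List


def simulate_length_biased(n_pop: int, tau: float = 0.5, seed: int = 0) -> List[float]:
    """Poisson sampling with P(select k | Y_k = y) = tau * y, Y ~ Uniform(1, 2)."""
    rng = random.Random(seed)
    population = [rng.uniform(1.0, 2.0) for _ in range(n_pop)]
    # Independent Bernoulli(tau * y_k) selection indicators.
    return sorted(yk for yk in population if rng.random() < tau * yk)


def weighted_cdf(alpha: float) -> float:
    """Limit c.d.f. for this setup: integral of y / E[Y] over [1, alpha]."""
    a = min(max(alpha, 1.0), 2.0)
    return (a * a - 1.0) / 3.0


def uniform_cdf(alpha: float) -> float:
    """Superpopulation c.d.f. of Uniform(1, 2)."""
    return min(max(alpha, 1.0), 2.0) - 1.0


def sup_distance(sorted_sample: List[float], cdf: Callable[[float], float]) -> float:
    """Sup distance between the empirical c.d.f. of the sample and cdf."""
    n = len(sorted_sample)
    return max(max(abs((i + 1) / n - cdf(v)), abs(i / n - cdf(v)))
               for i, v in enumerate(sorted_sample))


if __name__ == "__main__":
    sample = simulate_length_biased(50_000)
    print("distance to weighted c.d.f.:", sup_distance(sample, weighted_cdf))
    print("distance to superpopulation c.d.f.:", sup_distance(sample, uniform_cdf))
```

In this setup the two candidate limits differ by at most $1/12$, attained at $\alpha = 1.5$, so the unweighted empirical c.d.f. stays visibly away from the superpopulation c.d.f. while approaching the weighted one.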



3.3. Cluster sampling 

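Let $F$ denote the superpopulation c.d.f.: $F(t) = \int 1_{(-\infty, t]} f \, d\lambda$. Let $\tau \in \mathbb{R}$ be such
that $F(\tau) > 0$. Define $\mathcal{I}_{1\gamma} = (1_{(-\infty, \tau]}(Y_k))_{k \in U_\gamma}$ and $\mathcal{I}_{2\gamma} = (1_{(\tau, \infty)}(Y_k))_{k \in U_\gamma}$. The selection
mechanism is $\mathcal{I}_\gamma = \mathcal{I}_{1\gamma}$ or $\mathcal{I}_{2\gamma}$, each with probability 1/2, so uniform convergence of
the empirical c.d.f. is not possible. Note that

$$\mathrm{Cov}(I_{\gamma k}, I_{\gamma \ell} \mid Y_k = y_1, Y_\ell = y_2) = \tfrac{1}{2} 1_{(-\infty, \tau]}(y_1) 1_{(-\infty, \tau]}(y_2) + \tfrac{1}{2} 1_{(\tau, \infty)}(y_1) 1_{(\tau, \infty)}(y_2) - \tfrac{1}{4}$$

so that

$$\iint c_\gamma(y_1, y_2) f(y_1) f(y_2) \, dy_1 \, dy_2 = \tfrac{1}{2} F^2(\tau) + \tfrac{1}{2} (1 - F(\tau))^2 - \tfrac{1}{4} = \big(F(\tau) - \tfrac{1}{2}\big)^2 \neq o_\gamma(1)$$

unless $F(\tau) = 1/2$, and (1a) fails to hold. This example can be regarded as a "worst-case" cluster sample: the
sample consists of many elements but only one cluster, and the population is made up of
a small number of large clusters, none of which is fully representative of the population.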
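The failure of uniform convergence in this example is easy to see in a simulation. In the sketch below we assume $Y$ uniform on $[0, 1]$, so that $F(\tau) = \tau$; since every element is selected with conditional probability 1/2, the weight $m$ is constant and the candidate limit at $\tau$ is $F(\tau)$. The function name `cluster_sample_gap` is hypothetical.

```python
import random


def cluster_sample_gap(n_pop: int, tau: float = 0.3, seed: int = 0) -> float:
    """Return |F_gamma(tau) - F(tau)| for the one-cluster design.

    The sample is either all elements with y <= tau or all elements with
    y > tau, each with probability 1/2. Y ~ Uniform(0, 1) is an assumption
    of this sketch, so F(tau) = tau.
    """
    rng = random.Random(seed)
    population = [rng.random() for _ in range(n_pop)]
    take_low = rng.random() < 0.5
    sample = [yk for yk in population if (yk <= tau) == take_low]
    if not sample:
        # Empty sample: the empirical c.d.f. is 0 by the convention of Definition 1.
        return tau
    f_gamma_at_tau = sum(1 for v in sample if v <= tau) / len(sample)
    return abs(f_gamma_at_tau - tau)


if __name__ == "__main__":
    for n_pop in (100, 10_000, 1_000_000):
        print(n_pop, cluster_sample_gap(n_pop, seed=n_pop))
```

Whatever the population size, the empirical c.d.f. at $\tau$ equals 0 or 1, so the gap is $\tau$ or $1 - \tau$ and never shrinks.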



3.4. Cut-off sampling and take-all strata 

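In cut-off sampling a part of the population is excluded from sampling, so that $I_{\gamma k} = 0$
with probability one for some subset of $U_\gamma$. This may be due to physical limitations of
the sampling apparatus, like a net that lets small animals escape, or may be due to a
deliberate design decision. For example, a statistical agency may be willing to accept
the bias inherent in cutting off small $y$-values if the $y$-distribution is highly skewed, as is
often the case in establishment surveys (e.g., Särndal et al. [34], Section 14.4).

Consider cut-off sampling with $I_{\gamma k} = 0$ for $\{k \in U_\gamma : y_k \le \tau\}$, and simple random
sampling without replacement of size $\min\{n_\gamma, N_\gamma - \sum_{j \in U_\gamma} 1_{(-\infty, \tau]}(y_j)\}$ from the remaining
population, $\{j \in U_\gamma : y_j > \tau\}$.

Define $Z_k = 1_{(-\infty, \tau]}(Y_k)$ with corresponding realization $z_k = 1_{(-\infty, \tau]}(y_k)$. Let $\rho_\gamma = N_\gamma^{-1} n_\gamma$
and assume that $\lim_{\gamma \to \infty} \rho_\gamma = \rho$. We now verify A3.

Define $S_{\gamma A} = \sum_{j \in U_\gamma : j \notin A} Z_j$. By the weak law of large numbers, $N_\gamma^{-1} S_{\gamma A} \xrightarrow{P} F(\tau)$ as
$\gamma \to \infty$ for $A = \{k\}$ or $A = \{k, \ell\}$, and so for those sets $A$ we have

$$\lim_{\gamma \to \infty} E\left[\frac{\rho_\gamma - N_\gamma^{-1} S_{\gamma A}}{1 - N_\gamma^{-1} S_{\gamma A}} 1_{\{\rho_\gamma > N_\gamma^{-1} S_{\gamma A}\}}\right] = \frac{(\rho - F(\tau)) 1_{\{\rho > F(\tau)\}}}{1 - F(\tau)}$$

by the uniform integrability of the integrand. With the same argument,

$$\lim_{\gamma \to \infty} E\left[\frac{(n_\gamma - S_{\gamma\{k,\ell\}})(n_\gamma - 1 - S_{\gamma\{k,\ell\}})}{(N_\gamma - S_{\gamma\{k,\ell\}})(N_\gamma - 1 - S_{\gamma\{k,\ell\}})} 1_{\{n_\gamma > S_{\gamma\{k,\ell\}}\}}\right] = \left(\frac{\rho - F(\tau)}{1 - F(\tau)}\right)^2 1_{\{\rho > F(\tau)\}}.$$

Using conditional first and second-order inclusion probabilities under simple random
sampling, we have

$$m_\gamma(y_k) = z_k + (1 - z_k) E\left[\frac{n_\gamma - S_{\gamma\{k\}}}{N_\gamma - S_{\gamma\{k\}}} 1_{\{n_\gamma > S_{\gamma\{k\}}\}}\right] \to z_k + (1 - z_k) \frac{(\rho - F(\tau)) 1_{\{\rho > F(\tau)\}}}{1 - F(\tau)},$$

$$m'_\gamma(y_\ell, y_k) = z_k + (1 - z_\ell)(1 - z_k) E\left[\frac{n_\gamma - S_{\gamma\{k,\ell\}}}{N_\gamma - S_{\gamma\{k,\ell\}}} 1_{\{n_\gamma > S_{\gamma\{k,\ell\}}\}}\right] + z_\ell (1 - z_k) E\left[\frac{n_\gamma - 1 - S_{\gamma\{k,\ell\}}}{N_\gamma - 1 - S_{\gamma\{k,\ell\}}} 1_{\{n_\gamma - 1 > S_{\gamma\{k,\ell\}}\}} 1_{\{N_\gamma - 1 > S_{\gamma\{k,\ell\}}\}}\right]$$
$$\to z_k + (1 - z_k) \frac{(\rho - F(\tau)) 1_{\{\rho > F(\tau)\}}}{1 - F(\tau)},$$

$$d_\gamma(y_k, y_\ell) = E[I_{\gamma k} I_{\gamma \ell} \mid Y_k = y_k, Y_\ell = y_\ell]$$
$$= z_k z_\ell + \{z_k (1 - z_\ell) + (1 - z_k) z_\ell\} E\left[\frac{n_\gamma - 1 - S_{\gamma\{k,\ell\}}}{N_\gamma - 1 - S_{\gamma\{k,\ell\}}} 1_{\{n_\gamma - 1 > S_{\gamma\{k,\ell\}}\}}\right] + (1 - z_k)(1 - z_\ell) E\left[\frac{(n_\gamma - S_{\gamma\{k,\ell\}})(n_\gamma - 1 - S_{\gamma\{k,\ell\}})}{(N_\gamma - S_{\gamma\{k,\ell\}})(N_\gamma - 1 - S_{\gamma\{k,\ell\}})} 1_{\{n_\gamma > S_{\gamma\{k,\ell\}}\}}\right]$$
$$\to z_k z_\ell + (1 - z_k)(1 - z_\ell) \left(\frac{\rho - F(\tau)}{1 - F(\tau)}\right)^2 1_{\{\rho > F(\tau)\}} + \{z_k (1 - z_\ell) + (1 - z_k) z_\ell\} \frac{(\rho - F(\tau)) 1_{\{\rho > F(\tau)\}}}{1 - F(\tau)},$$

$$c_\gamma(y_k, y_\ell) = d_\gamma(y_k, y_\ell) - m'_\gamma(y_k, y_\ell) m'_\gamma(y_\ell, y_k) = o_\gamma(1),$$

and A3 is verified.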

Cut-off sampling with threshold $\tau$ is essentially the complement of stratified sampling with
a "take-all stratum": $I_{\gamma k} = 1$ for the set $\{k \in U_\gamma : z_k = 1\}$. Take-all strata are common
in practice, particularly for the highly-skewed populations in which cut-off sampling is
attractive. Arguments nearly identical to those above can be used to establish A3 in
the take-all case. This take-all stratified design is analogous to the well-known class of
case-control studies in epidemiology. We specifically consider prospective case-control
studies (e.g., Mantel [23], Langholz and Goldstein [21], Arratia et al. [1]), in which the
finite population of all disease cases and controls is stratified, disease cases ($z_k = 1$) are
selected with probability one, and controls ($z_k = 0$) are selected with probability less
than one.



3.5. With-replacement sampling with probability proportional to size



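Let $\{n_\gamma\}$ be a non-random sequence of positive integers with $n_\gamma < N_\gamma$ and suppose that
$f$ has strictly positive support: $\int 1_{(-\infty, 0]} f \, d\lambda = 0$. Consider the case of with-replacement
sampling of $n_\gamma$ draws, with probability of selecting element $k$ on the $h$th draw equal to $p_{\gamma k} \in [0, 1]$,
$\sum_{k \in U_\gamma} p_{\gamma k} = 1$. While $p_{\gamma k}$ could be constructed in many ways, a case of particular
interest is $p_{\gamma k} \propto Y_k$. This design is usually not feasible in practice, but statistical agencies
often attempt to draw samples with probability proportional to a size measure (p.p.s.)
that is highly correlated with $Y$. Such a design will be highly efficient for estimation of
the $Y$-total (indeed, a fixed-size p.p.s. design with probabilities proportional to $Y_k$ would
exactly reproduce the $Y$-total).

For $h = 1, \ldots, n_\gamma$, let $R_{\gamma h}$ be i.i.d. random variables with

$$P(R_{\gamma h} = k \mid \mathcal{Y}_\gamma) = \frac{Y_k}{\sum_{j \in U_\gamma} Y_j}.$$

Then $I_{\gamma k} = \sum_{h=1}^{n_\gamma} 1_{\{R_{\gamma h} = k\}}$ counts the number of draws for which element $k$ is selected.
Define $W_{\gamma A} = N_\gamma^{-1} \sum_{j \in U_\gamma : j \notin A} Y_j$. Then

$$m_\gamma(y_k) = \frac{n_\gamma}{N_\gamma} y_k E\left[\frac{1}{N_\gamma^{-1} y_k + W_{\gamma\{k\}}}\right],$$

$$m'_\gamma(y_k, y_\ell) = \frac{n_\gamma}{N_\gamma} y_k E\left[\frac{1}{N_\gamma^{-1}(y_k + y_\ell) + W_{\gamma\{k,\ell\}}}\right],$$

$$v_\gamma(y_k) = \left(\frac{n_\gamma}{N_\gamma} y_k\right)^2 \mathrm{Var}\left(\frac{1}{N_\gamma^{-1} y_k + W_{\gamma\{k\}}}\right) + \frac{n_\gamma}{N_\gamma} y_k E\left[\frac{W_{\gamma\{k\}}}{(N_\gamma^{-1} y_k + W_{\gamma\{k\}})^2}\right],$$

$$c_\gamma(y_k, y_\ell) = \frac{n_\gamma y_k y_\ell}{N_\gamma^2} \left\{ -E\left[\frac{1}{(N_\gamma^{-1}(y_k + y_\ell) + W_{\gamma\{k,\ell\}})^2}\right] + n_\gamma \mathrm{Var}\left(\frac{1}{N_\gamma^{-1}(y_k + y_\ell) + W_{\gamma\{k,\ell\}}}\right) \right\}.$$

Under mild additional conditions, A1 and A2 can be established using straightforward
bounding and limiting arguments. A sufficient set of conditions for either
A1 or A2 is $n_\gamma N_\gamma^{-1} \to \tau \in [0, 1]$ as $\gamma \to \infty$ and $E[Y_1^6] < \infty$. Under these conditions,
$m_\gamma(y) = \tau y (E[Y_1])^{-1} + o_\gamma(1)$, and the limiting c.d.f. is the same as in length-biased
sampling, as given by equation (8).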
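The claim that with-replacement sampling proportional to $Y$ leads to the same weighted limit can be illustrated by simulation. The sketch below assumes $Y$ uniform on $[1, 2]$ (our choice, so that the weighted c.d.f. is $(\alpha^2 - 1)/3$ on $[1, 2]$) and uses `random.choices` from the Python standard library to make the draws; the function names are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_pps_with_replacement(n_pop: int, n_draws: int,
                                  seed: int = 0) -> Tuple[List[float], List[int]]:
    """Draw n_draws elements with replacement, with P(draw k) proportional to Y_k."""
    rng = random.Random(seed)
    y = [rng.uniform(1.0, 2.0) for _ in range(n_pop)]
    draws = rng.choices(range(n_pop), weights=y, k=n_draws)
    counts = [0] * n_pop
    for k in draws:
        counts[k] += 1  # counts[k] is the number of draws selecting element k
    return y, counts


def sup_distance_to_weighted_cdf(y: List[float], counts: List[int]) -> float:
    """Sup distance between the unweighted empirical c.d.f. and (alpha^2 - 1) / 3."""
    total = sum(counts)
    cumulative = 0
    worst = 0.0
    for yk, c in sorted(zip(y, counts)):
        if c == 0:
            continue
        target = (yk * yk - 1.0) / 3.0
        # Compare just before and just after the jump of the empirical c.d.f. at yk.
        worst = max(worst, abs(cumulative / total - target))
        cumulative += c
        worst = max(worst, abs(cumulative / total - target))
    return worst


if __name__ == "__main__":
    y, counts = simulate_pps_with_replacement(20_000, 5_000)
    print(sup_distance_to_weighted_cdf(y, counts))
```

Here an element may be counted several times, which is why the empirical c.d.f. is built from the counts rather than from the set of distinct selected elements.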



3.6. Endogenous stratification 

Endogenous stratification, in which the sample is effectively stratified on the value of 
the dependent variable, is common in the health and social sciences (e.g., Hausman and 
Wise [16], Jewell [18], Shaw [36]). Often, it arises by design when a screening sample 
is selected, the dependent variable is observed, and then covariates are measured for a 
sub-sample that is stratified on the dependent variable: for example, undersampling the 
high-income stratum (Hausman and Wise [16]). It can also arise through uncontrolled 
selection effects, in much the same way as length-biased sampling. One such example 
comes from fisheries surveys, in which a field interviewer is stationed at a dock for a fixed 
length of time, and intercepts recreational fishing boats as they return to the dock. The 
interviewer tends to select high-catch boats and, while busy measuring the fish caught on 
those boats, misses more of the low-catch boats. Thus, sampling effort is endogenously 
stratified on catch (Sullivan et al. [38]). 

We consider a sample endogenously stratified on the order statistics of $Y$. Let $\{H_\gamma\}$ be
a non-random sequence of positive integers, which may remain bounded or go to infinity.
For each $\gamma$, let $\{N_{\gamma h}\}_{h=1}^{H_\gamma}$ be a set of non-random positive integers with $\sum_{h=1}^{H_\gamma} N_{\gamma h} = N_\gamma$,
and let $\{n_{\gamma h}\}_{h=1}^{H_\gamma}$ be a set of non-random positive integers with $n_{\gamma h} \le N_{\gamma h}$. Let

$$Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(N_\gamma)}$$

denote the order statistics for the $\gamma$th population, which is stratified by taking the first
$N_{\gamma 1}$ values as stratum 1, the next $N_{\gamma 2}$ as stratum 2, etc., with the last $N_{\gamma H_\gamma}$ values
constituting stratum $H_\gamma$. The $\gamma$th sample is then a stratified simple random sample
without replacement of size $n_{\gamma h}$ from the $N_{\gamma h}$ elements in stratum $h$.

Define $M_{\gamma 0} = 0$ and $M_{\gamma h} = \sum_{g=1}^{h} N_{\gamma g}$. Because $H_\gamma$, $N_{\gamma h}$ and $n_{\gamma h}$ are not random, we
then have

$$m_\gamma(y) = \sum_{h=1}^{H_\gamma} \frac{n_{\gamma h}}{N_{\gamma h}} P(Y_{(M_{\gamma, h-1})} < Y_k \le Y_{(M_{\gamma h})} \mid Y_k = y)$$



Uniform c.d.f. convergence under informative selection 
where $F_{N_\gamma - 1}(\cdot)$ is the empirical cumulative distribution function for $\{Y_j\}_{j \in U_\gamma : j \neq k}$. From
the classical Glivenko-Cantelli theorem, $F_{N_\gamma - 1}(y)$ converges uniformly almost surely
to $F$. Similar computations can be used to derive $m'_\gamma(y_1, y_2)$ and $c_\gamma(y_1, y_2)$ and their
limits. With such derivations, it is possible to establish the following result, the proof of
which is omitted.

Result 1. If $G(\alpha) = \lim_{\gamma \to \infty} \sum_{h=1}^{H_\gamma} n_{\gamma h} N_{\gamma h}^{-1} 1_{(N_\gamma^{-1} M_{\gamma, h-1}, \, N_\gamma^{-1} M_{\gamma h}]}(\alpha)$ exists except for a
finite number of points and is a piecewise continuous nonnull function, and the convergence
is uniform in $\alpha$, then A3 and A2 hold.

4. Conclusion 

We have given assumptions on the selection mechanism and the superpopulation model
under which the unweighted empirical c.d.f. converges uniformly to a weighted version of
the superpopulation c.d.f. Because the conditions we specify on the informative selection
mechanism are closely tied to first and second-order inclusion probabilities in a standard
design-based survey sampling setting, the conditions are verifiable. Our examples
illustrate the computations for selection mechanisms encountered in real surveys and
observational studies. We expect these conditions to be useful in studying the properties
of other basic sample statistics under informative selection, which will be the subject of
further research.

Appendix A: Proofs of Theorems 1 and 2 

The first subsection contains the proof of Theorem 1. The proof consists in showing the
uniform $L_2$ convergence of the empirical c.d.f., seen as a ratio of two random variables.
First, we show that from A1 we can deduce the $L_2$ convergence of both the numerator
and denominator, then the classical proof of Glivenko-Cantelli is adapted to obtain a
uniform $L_2$ convergence.

The second subsection contains the proof of Theorem 2. We first construct two
sequences of random variables $(\mathcal{I}'_\gamma)$ and $Y'$ such that for all $\gamma$, $(\mathcal{I}'_\gamma, \mathcal{Y}'_\gamma)$ and $(\mathcal{I}_\gamma, \mathcal{Y}_\gamma)$ have the
same distribution. We then prove uniform $L_2$ convergence of the empirical c.d.f. defined
from $(\mathcal{I}'_\gamma)$ and $Y'$, almost surely in $Y'$. The result is "design-based" in the sense that it
is conditional on $Y'$, and is of independent interest. We conclude by showing the almost
sure convergence.

A.1. Proof of Theorem 1: Uniform L2 convergence of the 
empirical c.d.f. 

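Lemma 1. Given a bounded measurable function $b:\mathbb{R}\to\mathbb{R}$, A0 and A1 imply that

$$\frac{1}{N_\gamma}\sum_{k\in U_\gamma} b(Y_k)I_{\gamma k}\ \xrightarrow{L_2}\ \int b\,m f\,d\lambda \quad\text{as } \gamma\to\infty.$$

Proof. Assume A0 and A1. The exchangeability property (1) implies

$$\mathrm{E}\left[\frac{1}{N_\gamma}\sum_{k\in U_\gamma} b(Y_k)I_{\gamma k}\right]
= \int b\,m_\gamma f\,d\lambda \ \xrightarrow[\gamma\to\infty]{}\ \int b\,m f\,d\lambda$$

by (0a), (0b) and the dominated convergence theorem. Further, (1) implies

$$\begin{aligned}
\operatorname{Var}\left(\frac{1}{N_\gamma}\sum_{k\in U_\gamma} b(Y_k)I_{\gamma k}\right)
&= \frac{1}{N_\gamma^2}\sum_{k,\ell\in U_\gamma}\Bigl\{\operatorname{Cov}\bigl(b(Y_k)\mathrm{E}[I_{\gamma k}\mid Y_k,Y_\ell],\, b(Y_\ell)\mathrm{E}[I_{\gamma\ell}\mid Y_k,Y_\ell]\bigr) \\
&\qquad\qquad + \mathrm{E}\bigl[b(Y_k)b(Y_\ell)\operatorname{Cov}(I_{\gamma k},I_{\gamma\ell}\mid Y_k,Y_\ell)\bigr]\Bigr\} \\
&= \frac{N_\gamma-1}{N_\gamma}\Bigl\{\iint b(y_1)b(y_2)\,m'_\gamma(y_1,y_2)\,m'_\gamma(y_2,y_1)\,f(y_1)f(y_2)\,dy_1\,dy_2 \\
&\qquad\qquad - \Bigl(\iint b(y_1)\,m'_\gamma(y_1,y_2)\,f(y_1)f(y_2)\,dy_1\,dy_2\Bigr)^2 \\
&\qquad\qquad + \iint b(y_1)b(y_2)\,c_\gamma(y_1,y_2)\,f(y_1)f(y_2)\,dy_1\,dy_2\Bigr\} \\
&\quad + \frac{1}{N_\gamma}\Bigl\{\int b^2 v_\gamma f\,d\lambda + \int b^2 m_\gamma^2 f\,d\lambda - \Bigl(\int b\,m_\gamma f\,d\lambda\Bigr)^2\Bigr\} \\
&= \frac{N_\gamma-1}{N_\gamma}\Bigl\{\iint b(y_1)b(y_2)\bigl(m'_\gamma(y_1,y_2)\,m'_\gamma(y_2,y_1) - m_\gamma(y_1)m_\gamma(y_2)\bigr)f(y_1)f(y_2)\,dy_1\,dy_2 \\
&\qquad\qquad + \iint b(y_1)b(y_2)\,c_\gamma(y_1,y_2)\,f(y_1)f(y_2)\,dy_1\,dy_2\Bigr\} \\
&\quad + \frac{1}{N_\gamma}\Bigl\{\int b^2\,(v_\gamma + m_\gamma^2)\,f\,d\lambda - \Bigl(\int b\,m_\gamma f\,d\lambda\Bigr)^2\Bigr\} \\
&= o_\gamma(1)
\end{aligned}$$

by (1a), (1b), and (1c), and the result is proved. □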
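
As a rough numerical illustration of Lemma 1 (not part of the original proof), the sketch below simulates a hypothetical design chosen only for convenience: the $Y_k$ are i.i.d. Uniform(0,1), units are selected independently with probability $m(y)=y$, and $b(y)=y$. The design, the function names and the seed are all assumptions. Under them $\int b\,m f\,d\lambda = 1/3$, and the variance of the statistic should shrink roughly like $1/N$.

```python
import random

def weighted_mean_statistic(n_pop, rng):
    """Return N^{-1} * sum_k b(Y_k) I_k for one simulated population and sample."""
    total = 0.0
    for _ in range(n_pop):
        y = rng.random()            # Y_k ~ Uniform(0, 1), so f = 1 on [0, 1] (assumed)
        if rng.random() < y:        # I_k ~ Bernoulli(m(y)) with m(y) = y (assumed)
            total += y              # b(y) = y, bounded on the support (assumed)
    return total / n_pop

def replicate(n_pop, n_rep, seed=0):
    """Monte Carlo mean and variance of the statistic over n_rep replications."""
    rng = random.Random(seed)
    draws = [weighted_mean_statistic(n_pop, rng) for _ in range(n_rep)]
    mean = sum(draws) / n_rep
    var = sum((d - mean) ** 2 for d in draws) / (n_rep - 1)
    return mean, var

mean_small, var_small = replicate(n_pop=200, n_rep=200)
mean_large, var_large = replicate(n_pop=5000, n_rep=200)
print("N=200 :", mean_small, var_small)
print("N=5000:", mean_large, var_large)
```

Both means should sit near 1/3, with a visibly smaller Monte Carlo variance for the larger population, which is the L2 convergence the lemma describes in this simple case.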

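Lemma 2. Under A0 and A1, the numerator of the empirical c.d.f. converges uniformly 
in L2: 

$$\sup_{\alpha\in\mathbb{R}}\left|\frac{\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k)I_{\gamma k}}{N_\gamma}-\int 1_{(-\infty,\alpha]}\,m f\,d\lambda\right|\ \xrightarrow{L_2}\ 0 \quad\text{as } \gamma\to\infty.$$

Proof. We first define $G_\gamma:\mathbb{R}\to\mathbb{R}^+$ and $G_s:\mathbb{R}\to\mathbb{R}^+$ as

$$G_\gamma(\alpha)=\frac{1}{N_\gamma}\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k)I_{\gamma k}
\quad\text{and}\quad
G_s(\alpha)=\int 1_{(-\infty,\alpha]}\,m f\,d\lambda.$$

With these definitions,

$$\sup_{\alpha\in\mathbb{R}}\left|\frac{\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k)I_{\gamma k}}{N_\gamma}-\int 1_{(-\infty,\alpha]}\,m f\,d\lambda\right| = \|G_\gamma-G_s\|_\infty.$$

Let $\eta\in\mathbb{N}^*$ index the positive integers and define a sequence of subdivisions
$\{\alpha_{\eta,q}\}_{q=0}^{\eta+1}$ via $\alpha_{\eta,0}=-\infty$, $\alpha_{\eta,\eta+1}=\infty$, and for
$q=1,\ldots,\eta$,

$$\alpha_{\eta,q}=\inf\{\alpha\in\mathbb{R}\mid G_s(\alpha)\ge \eta^{-1}q\,G_s(\infty)\}.$$

We first show that for all positive integers $\eta$,

$$\sup_{\alpha\in\mathbb{R}}\{|G_\gamma(\alpha)-G_s(\alpha)|\}\le \max_{0\le q\le \eta+1}\{|G_\gamma(\alpha_{\eta,q})-G_s(\alpha_{\eta,q})|\}+\frac{G_s(\infty)}{\eta}.$$

Let $\eta\in\mathbb{N}^*$ and $\alpha\in\mathbb{R}$. Then $\alpha\in[\alpha_{\eta,q},\alpha_{\eta,q+1}]$ for some
$0\le q\le \eta$, and

$$\begin{aligned}
G_\gamma(\alpha_{\eta,q}) &\le G_\gamma(\alpha)\le G_\gamma(\alpha_{\eta,q+1}),\\
G_s(\alpha_{\eta,q}) &\le G_s(\alpha)\le G_s(\alpha_{\eta,q+1}),\\
G_s(\alpha_{\eta,q+1})-\frac{G_s(\infty)}{\eta} &\le G_s(\alpha)\le G_s(\alpha_{\eta,q})+\frac{G_s(\infty)}{\eta}.
\end{aligned}$$

Combining these inequalities, we have

$$G_\gamma(\alpha_{\eta,q})-G_s(\alpha_{\eta,q})-\frac{G_s(\infty)}{\eta}\le G_\gamma(\alpha)-G_s(\alpha)
\le G_\gamma(\alpha_{\eta,q+1})-G_s(\alpha_{\eta,q+1})+\frac{G_s(\infty)}{\eta},$$

so that

$$\begin{aligned}
|G_\gamma(\alpha)-G_s(\alpha)|
&\le \max\{|G_\gamma(\alpha_{\eta,q})-G_s(\alpha_{\eta,q})|,\ |G_\gamma(\alpha_{\eta,q+1})-G_s(\alpha_{\eta,q+1})|\}+\frac{G_s(\infty)}{\eta}\\
&\le \max_{0\le q'\le \eta+1}\{|G_\gamma(\alpha_{\eta,q'})-G_s(\alpha_{\eta,q'})|\}+\frac{G_s(\infty)}{\eta}.
\end{aligned}$$

Thus, for all $\alpha\in\mathbb{R}$,

$$|G_\gamma(\alpha)-G_s(\alpha)|^2\le 2\left(\max_{0\le q\le \eta+1}\{|G_\gamma(\alpha_{\eta,q})-G_s(\alpha_{\eta,q})|^2\}+\frac{G_s(\infty)^2}{\eta^2}\right),$$

so that

$$\mathrm{E}\bigl[\|G_\gamma-G_s\|_\infty^2\bigr]\le 2\,\mathrm{E}\Bigl[\max_{0\le q\le \eta+1}\{|G_\gamma(\alpha_{\eta,q})-G_s(\alpha_{\eta,q})|^2\}\Bigr]+\frac{2\,G_s(\infty)^2}{\eta^2}. \qquad (9)$$

Let $\varepsilon>0$ be given. Choose $\eta\in\mathbb{N}$ so large that $2G_s(\infty)^2\eta^{-2}<\varepsilon/2$,
then use Lemma 1 to choose $\Gamma$ so that $\gamma\ge\Gamma$ implies

$$2\,\mathrm{E}\Bigl[\max_{0\le q\le \eta+1}\{|G_\gamma(\alpha_{\eta,q})-G_s(\alpha_{\eta,q})|^2\}\Bigr]<\frac{\varepsilon}{2}.$$

Hence, for all $\gamma\ge\Gamma$, the right-hand side of (9) is bounded by $\varepsilon$, which was
arbitrary, so

$$\lim_{\gamma\to\infty}\mathrm{E}\bigl[(\|G_\gamma-G_s\|_\infty)^2\bigr]=0.$$

□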
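
The grid step of this proof can be checked numerically. The sketch below is only an illustration under assumed choices (i.i.d. Uniform(0,1) responses, independent selection with $m(y)=y$, so that $G_s(\alpha)=\alpha^2/2$ on $[0,1]$ and $\alpha_{\eta,q}=\sqrt{q/\eta}$); the helper names are hypothetical. It compares the exact supremum of $|G_\gamma-G_s|$ with the bound "maximum over the grid plus $G_s(\infty)/\eta$".

```python
import bisect
import math
import random

def simulate_sample(n_pop, seed=1):
    """Sorted selected responses: Y_k ~ Uniform(0,1), kept with probability m(y) = y."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_pop):
        y = rng.random()
        if rng.random() < y:
            kept.append(y)
    kept.sort()
    return kept

def g_s(alpha):
    """G_s(alpha) = integral of 1_{(-inf, alpha]} m f = alpha^2 / 2, clamped to [0, 1]."""
    a = min(max(alpha, 0.0), 1.0)
    return 0.5 * a * a

def g_gamma(alpha, sample, n_pop):
    """G_gamma(alpha) = N^{-1} * number of selected units with Y_k <= alpha."""
    return bisect.bisect_right(sample, alpha) / n_pop

def sup_distance(sample, n_pop):
    """Exact sup over alpha: check both sides of every jump of the step function."""
    worst = abs(len(sample) / n_pop - g_s(math.inf))
    for i, y in enumerate(sample):
        worst = max(worst, abs(i / n_pop - g_s(y)), abs((i + 1) / n_pop - g_s(y)))
    return worst

def grid_bound(sample, n_pop, eta):
    """max over the subdivision alpha_{eta,q} of |G_gamma - G_s|, plus G_s(inf) / eta."""
    grid = [-math.inf] + [math.sqrt(q / eta) for q in range(1, eta + 1)] + [math.inf]
    worst_on_grid = max(abs(g_gamma(a, sample, n_pop) - g_s(a)) for a in grid)
    return worst_on_grid + g_s(math.inf) / eta

n_pop = 20000
sample = simulate_sample(n_pop)
for eta in (5, 50, 500):
    print(eta, sup_distance(sample, n_pop), grid_bound(sample, n_pop, eta))
```

The bound holds for every $\eta$; a larger $\eta$ tightens the $G_s(\infty)/\eta$ term at the cost of a maximum over more grid points, which is the trade-off the proof resolves by fixing $\eta$ first and then letting $\gamma$ grow.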

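Proof of Theorem 1. By Definitions 1 and 3 and A0, for all $\alpha\in\mathbb{R}$,

$$F_\gamma(\alpha)=\frac{G_\gamma(\alpha)}{G_\gamma(\infty)+1_{G_\gamma(\infty)=0}},
\qquad
F_s(\alpha)=\frac{G_s(\alpha)}{G_s(\infty)},$$

so

$$\begin{aligned}
\|F_\gamma-F_s\|_\infty
&=\left\|\frac{G_\gamma}{G_\gamma(\infty)+1_{G_\gamma(\infty)=0}}-\frac{G_s}{G_s(\infty)}\right\|_\infty\\
&\le \frac{\|G_\gamma-G_s\|_\infty}{G_s(\infty)}
+\left\|G_\gamma\,\frac{G_s(\infty)-(G_\gamma(\infty)+1_{G_\gamma(\infty)=0})}{G_s(\infty)\,(G_\gamma(\infty)+1_{G_\gamma(\infty)=0})}\right\|_\infty\\
&\le \frac{\|G_\gamma-G_s\|_\infty}{G_s(\infty)}
+\frac{\|G_\gamma\|_\infty}{G_\gamma(\infty)+1_{G_\gamma(\infty)=0}}\cdot\frac{|G_\gamma(\infty)+1_{G_\gamma(\infty)=0}-G_s(\infty)|}{G_s(\infty)}\\
&\le \frac{\|G_\gamma-G_s\|_\infty}{G_s(\infty)}
+\frac{|G_\gamma(\infty)+1_{G_\gamma(\infty)=0}-G_s(\infty)|}{G_s(\infty)}\\
&\le \frac{\|G_\gamma-G_s\|_\infty}{G_s(\infty)}
+\frac{|G_s(\infty)-G_\gamma(\infty)|}{G_s(\infty)}
+\frac{1_{G_\gamma(\infty)=0}}{G_s(\infty)}.
\end{aligned}$$

From Lemma 2, the first two summands converge to 0 in L2. From (1d), so does the 
third summand. □ 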
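
A small simulation can make the three-summand bound concrete. The sketch below again assumes a hypothetical design (Uniform(0,1) responses, independent selection with $m(y)=y$), for which $G_s(\alpha)=\alpha^2/2$, $G_s(\infty)=1/2$ and $F_s(\alpha)=\alpha^2$ on $[0,1]$; none of these choices come from the paper. It reports $\|F_\gamma-F_s\|_\infty$ next to the bound for growing population sizes.

```python
import random

def draw_sample(n_pop, rng):
    """Sorted selected responses under the assumed design (m(y) = y on [0, 1])."""
    kept = []
    for _ in range(n_pop):
        y = rng.random()
        if rng.random() < y:
            kept.append(y)
    kept.sort()
    return kept

def sup_abs_diff(sample, scale, limit_fn):
    """sup over alpha of |#{Y_k <= alpha} / scale - limit_fn(alpha)|.

    limit_fn must be continuous and nondecreasing with all its mass on [0, 1].
    """
    worst = abs(len(sample) / scale - limit_fn(1.0))   # behaviour as alpha -> infinity
    for i, y in enumerate(sample):
        t = limit_fn(y)
        worst = max(worst, abs(i / scale - t), abs((i + 1) / scale - t))
    return worst

def theorem1_terms(n_pop, seed=2):
    rng = random.Random(seed)
    sample = draw_sample(n_pop, rng)
    g_inf = len(sample) / n_pop                  # G_gamma(infinity)
    empty = 1.0 if len(sample) == 0 else 0.0     # indicator 1_{G_gamma(infinity) = 0}
    gs_inf = 0.5                                 # G_s(infinity) = integral of y over [0, 1]
    # F_gamma = G_gamma / (G_gamma(inf) + indicator), i.e. counts divided by N * (g_inf + empty)
    f_dist = sup_abs_diff(sample, n_pop * (g_inf + empty), lambda a: a * a)
    g_dist = sup_abs_diff(sample, n_pop, lambda a: 0.5 * a * a)
    bound = (g_dist + abs(gs_inf - g_inf) + empty) / gs_inf
    return f_dist, bound

results = {n: theorem1_terms(n) for n in (500, 5000, 50000)}
for n, (f_dist, bound) in results.items():
    print(n, f_dist, bound)
```

In this toy setting both columns shrink as the population grows, and the bound always dominates the distance, as the chain of inequalities requires.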



A.2. Proof of Theorem 2: Uniform almost sure convergence of 
the empirical c.d.f. 

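Construction of $I'_\gamma$, $Y'$

We define $Y'$ and $I'_\gamma$ on the probability space
$(\Omega\times[0,1],\ \mathcal{A}\otimes\mathcal{B}_{[0,1]},\ P'=P\otimes\lambda_{[0,1]})$. First,
define $Y':\Omega\times[0,1]\to\mathbb{R}^{\mathbb{N}}$ via

$$Y'(\omega,x)=Y(\omega).$$

Let $\mathcal{Y}'_\gamma$ be the vector of random variables $(Y'_1,\ldots,Y'_{N_\gamma})$ and note that
$\mathcal{Y}'_\gamma(\omega,x)=\mathcal{Y}_\gamma(\omega)$. Let
$S_{\gamma,y}=\{i\in\mathbb{N}^{N_\gamma}:\ g_\gamma(i,y)\neq 0\}$ and note that for a given
$y\in\mathbb{R}^{N_\gamma}$, $\sum_{i\in S_{\gamma,y}}g_\gamma(i,y)=1$.

Define $h_\gamma:\mathbb{R}^{N_\gamma}\times\mathbb{N}^{N_\gamma}\to\mathbb{R}$ via

$$h_\gamma(y,i)=\sup_{\alpha\in\mathbb{R}}\left|\frac{\sum_{k\in U_\gamma}i_k\,1_{(-\infty,\alpha]}(y_k)}{1_{i=0}+\sum_{k\in U_\gamma}i_k}-\frac{G_s(\alpha)}{G_s(\infty)}\right|.$$

We now impose an order on the $M_{\gamma,y}$ vectors in $S_{\gamma,y}$ by requiring $h_\gamma$ to be
non-increasing; that is, for vectors $i^{(t)}, i^{(u)}\in S_{\gamma,y}$, $t\le u$ if and only if
$h_\gamma(y,i^{(t)})\ge h_\gamma(y,i^{(u)})$. Any ties can be resolved, for example, by
randomization. For $\omega\in\Omega$ and $x\in[0,1]$, we then define $I'_\gamma(\omega,0)=i^{(1)}$
and for $x>0$

$$I'_\gamma(\omega,x)=\sum_{u=1}^{M_{\gamma,\mathcal{Y}_\gamma(\omega)}} i^{(u)}\,
1_{\left(\sum_{t<u}g_\gamma(i^{(t)},\mathcal{Y}_\gamma(\omega)),\ \sum_{t\le u}g_\gamma(i^{(t)},\mathcal{Y}_\gamma(\omega))\right]}(x).$$

Because we use uniform measure on $\mathcal{B}_{[0,1]}$, the vector $i^{(u)}$ is sampled from
$S_{\gamma,\mathcal{Y}_\gamma(\omega)}$ with probability $g_\gamma(i^{(u)},\mathcal{Y}_\gamma(\omega))$.
Thus, by construction we have for all $\gamma$,

$$P'[I'_\gamma=i\mid \mathcal{Y}'_\gamma=y]=g_\gamma(i,y)=P[I_\gamma=i\mid \mathcal{Y}_\gamma=y]$$

and $P'[\mathcal{Y}'_\gamma=y]=P[\mathcal{Y}_\gamma=y]$, so that

$$P'[I'_\gamma=i,\ \mathcal{Y}'_\gamma=y]=P[I_\gamma=i,\ \mathcal{Y}_\gamma=y].$$

This yields the following property.

Property 1. For all $\gamma$,

$$h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)=\sup_{\alpha\in\mathbb{R}}|F'_\gamma(\alpha)-F_s(\alpha)|=\|F'_\gamma-F_s\|_\infty$$

has the same law as $\|F_\gamma-F_s\|_\infty$, where $F'_\gamma$ is defined in (7).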
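
The construction above is an inverse-transform sampler on $[0,1]$ whose cells are ordered by non-increasing $h_\gamma$. A minimal sketch follows; the four sample vectors, their probabilities and their $h$ values are made-up toy numbers, and `build_selector` is a hypothetical helper name. It shows the property used later in the proof: the set $\{x: h > \varepsilon\}$ is an initial segment of $[0,1]$.

```python
import itertools

def build_selector(support):
    """support: list of (vector, probability, h_value) with probabilities summing to 1.

    Returns a function x -> (vector, probability, h_value) that maps x in [0, 1]
    to the cell of the partition ordered by non-increasing h_value.
    """
    ordered = sorted(support, key=lambda item: -item[2])
    upper_ends = list(itertools.accumulate(p for _, p, _ in ordered))

    def selector(x):
        if x <= 0.0:
            return ordered[0]                 # I'(omega, 0) is the first vector
        for item, end in zip(ordered, upper_ends):
            if x <= end:                      # cell is (previous end, end]
                return item
        return ordered[-1]                    # guards against rounding at x = 1

    return selector

# Toy support for a population of size 3 (all numbers are illustrative assumptions).
toy_support = [
    ((1, 0, 0), 0.1, 0.40),
    ((0, 1, 1), 0.4, 0.05),
    ((1, 1, 0), 0.3, 0.20),
    ((1, 1, 1), 0.2, 0.10),
]
selector = build_selector(toy_support)

eps = 0.15
flags = [selector(j / 1000)[2] > eps for j in range(1001)]
measure = sum(p for _, p, h in toy_support if h > eps)   # Lebesgue measure of {h > eps}
print(selector(0.05)[0], selector(0.25)[0], selector(0.5)[0], selector(0.9)[0])
print("measure of {h > eps}:", measure)
```

Because the cells are laid out from largest to smallest $h$, the exceedance flags switch from true to false exactly once as $x$ increases, so the exceedance set is an interval starting at 0 whose length is the total probability of the vectors with $h>\varepsilon$.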



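Define $G'_\gamma$ via

$$G'_\gamma(\alpha)=\frac{1}{N_\gamma}\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y'_k)\,I'_{\gamma k},$$

noting that $F'_\gamma=G'_\gamma/(G'_\gamma(\infty)+1_{G'_\gamma(\infty)=0})$. We then have the
following lemma.

Lemma 3. Under A0 and A2, for all $\alpha\in\mathbb{R}$,

$$\lim_{\gamma\to\infty}\int\bigl(G'_\gamma(\alpha)(\omega,x)-G_s(\alpha)\bigr)^2\,d\lambda(x)=0\quad P\text{-a.s. }(\omega).$$

Proof. Let

$$\Omega_{GC}=\left\{\omega\in\Omega:\ \lim_{\gamma\to\infty}\sup_{\alpha\in\mathbb{R}}\left|N_\gamma^{-1}\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k)(\omega)-\int 1_{(-\infty,\alpha]}\,f\,d\lambda\right|=0\right\}.$$

From the Glivenko-Cantelli theorem, $P(\Omega_{GC})=1$. We will show that for all
$\omega\in\Omega_{GC}$:

$$\int_{[0,1]}\bigl(G'_\gamma(\alpha)(\omega,x)-G_s(\alpha)\bigr)^2\,d\lambda(x)=o_\gamma(1).$$

Let $\omega\in\Omega_{GC}$. We then have

$$\begin{aligned}
&\left(\int_{[0,1]}\bigl(G'_\gamma(\alpha)(\omega,x)-G_s(\alpha)\bigr)^2\,d\lambda(x)\right)^{1/2}\\
&\quad\le\left(\int_{[0,1]}\left(G'_\gamma(\alpha)(\omega,x)-\frac{\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k(\omega))\int_{[0,1]}I'_{\gamma k}(\omega,u)\,d\lambda(u)}{N_\gamma}\right)^2 d\lambda(x)\right)^{1/2}\\
&\qquad+\left|\frac{\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k(\omega))\int_{[0,1]}I'_{\gamma k}(\omega,u)\,d\lambda(u)}{N_\gamma}-\frac{\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k(\omega))\,m_\gamma(Y_k(\omega))}{N_\gamma}\right|\\
&\qquad+\left|\frac{\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k(\omega))\,m_\gamma(Y_k(\omega))}{N_\gamma}-\int 1_{(-\infty,\alpha]}\,m_\gamma f\,d\lambda\right|\\
&\qquad+\left|\int 1_{(-\infty,\alpha]}\,m_\gamma f\,d\lambda-\int 1_{(-\infty,\alpha]}\,m f\,d\lambda\right|.
\end{aligned}$$

The first term is the square root of

$$\operatorname{Var}\bigl(G'_\gamma(\alpha)\mid \mathcal{Y}'_\gamma=(Y_1(\omega),\ldots,Y_{N_\gamma}(\omega))\bigr)=N_\gamma^{-2}\,o_\gamma(N_\gamma^2)=o_\gamma(1)$$

by (2a). The second term is

$$\left|\frac{1}{N_\gamma}\sum_{k\in U_\gamma}1_{(-\infty,\alpha]}(Y_k(\omega))\Bigl(\mathrm{E}\bigl[I_{\gamma k}\mid \mathcal{Y}_\gamma=\mathcal{Y}_\gamma(\omega)\bigr]-m_\gamma(Y_k(\omega))\Bigr)\right|=o_\gamma(1)$$

by (2b). The third term is $o_\gamma(1)$ because the convergence of the empirical measure given
by A2 implies the convergence of the integral for all bounded random variables. Finally,
the fourth term is $o_\gamma(1)$ by A0 and the dominated convergence theorem. □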

The following lemma is of independent interest, yielding design-based uniform L2
convergence of the empirical c.d.f.

Lemma 4. Under A0 and A2,

$$\int\bigl(h_\gamma(\mathcal{Y}'_\gamma(\omega,x),I'_\gamma(\omega,x))\bigr)^2\,d\lambda(x)=o_\gamma(1)\quad P\text{-a.s. }(\omega).$$

Proof. Starting from Lemma 3 and adapting the proof of Lemma 2, we have that:
A2 $\Rightarrow\ \int\bigl(\|G_\gamma(\mathcal{Y}'_\gamma(\omega,x),I'_\gamma(\omega,x))-G_s\|_\infty\bigr)^2\,d\lambda(x)=o_\gamma(1)$
$P$-a.s. $(\omega)$. We then adapt the end of the proof of Theorem 1 and get the result. □

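Definition 5. For $\omega\in\Omega$, $\gamma\in\mathbb{N}$ and all $\varepsilon>0$,
$a_{\varepsilon,\gamma,\omega}\in[0,1]$ is defined as

$$a_{\varepsilon,\gamma,\omega}=\int_{[0,1]}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}\,d\lambda(x)
=\lambda_{[0,1]}\bigl(\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,\cdot)>\varepsilon\}\bigr).$$

Property 2. For all $\varepsilon>0$,

$$\limsup_{\gamma\to\infty}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}=1_{\{0\}}(x)\quad P\text{-a.s. }(\omega).$$

Proof. First note that $\forall x\in[0,1]$,
$1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}=1_{[0,a_{\varepsilon,\gamma,\omega}]}(x)$
because by construction of $I'_\gamma$, $\mathcal{Y}'_\gamma$, the set
$\{x\in[0,1]:\ h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}$ is a subinterval of
$[0,1]$ containing 0, of measure $a_{\varepsilon,\gamma,\omega}$. Further, $\forall x\in[0,1]$,

$$\limsup_{\gamma\to\infty}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}=1_{[0,\ \limsup_{\gamma\to\infty}a_{\varepsilon,\gamma,\omega}]}(x). \qquad (10)$$

By Lemma 4, the random variable

$$h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,\cdot):\ ([0,1],\mathcal{B}_{[0,1]},\lambda_{[0,1]})\to\mathbb{R}$$

converges in $L_2(\lambda)$ to 0, $P$-a.s. $(\omega)$, hence it also converges in probability
to 0, and so $\lim_{\gamma\to\infty}a_{\varepsilon,\gamma,\omega}=0$. The result then follows from
equation (10). □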

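Proof of Theorem 2. We want to show that

A0, A2 $\Rightarrow\ \|F_\gamma-F_s\|_\infty\xrightarrow{a.s.}0$ as $\gamma\to\infty$,

which is equivalent to showing that

A0, A2 $\Rightarrow\ P'\bigl(\{\lim_{\gamma\to\infty}h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)=0\}\bigr)=1$.

Assume A0 and A2. We calculate:

$$\begin{aligned}
P'\Bigl(\Bigl\{\lim_{\gamma\to\infty}h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)=0\Bigr\}\Bigr)
&=P'\Bigl(\bigcap_{\varepsilon>0}\bigcup_{\Gamma}\bigcap_{\gamma\ge\Gamma}\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)\le\varepsilon\}\Bigr)\\
&=\lim_{\varepsilon\to 0}P'\Bigl(\bigcup_{\Gamma}\bigcap_{\gamma\ge\Gamma}\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)\le\varepsilon\}\Bigr)\\
&=\lim_{\varepsilon\to 0}\Bigl[1-P'\Bigl(\bigcap_{\Gamma}\bigcup_{\gamma\ge\Gamma}\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)>\varepsilon\}\Bigr)\Bigr]\\
&=1-\lim_{\varepsilon\to 0}\int\limsup_{\gamma\to\infty}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}\,dP'(\omega,x).
\end{aligned}$$

Let $\varepsilon>0$. Applying Fubini's theorem,

$$\int\limsup_{\gamma\to\infty}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}\,dP'(\omega,x)
=\int_\Omega\left(\int_{[0,1]}\limsup_{\gamma\to\infty}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}\,d\lambda_{[0,1]}(x)\right)dP(\omega).$$

Since we have $\limsup_{\gamma\to\infty}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}=1_{\{0\}}(x)$
$P$-a.s. $(\omega)$, we also have for all $\varepsilon>0$ that

$$\int_{[0,1]}\limsup_{\gamma\to\infty}1_{\{h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)(\omega,x)>\varepsilon\}}\,d\lambda_{[0,1]}(x)=\int_{[0,1]}1_{\{0\}}(x)\,d\lambda_{[0,1]}(x)=0$$

$P$-a.s. $(\omega)$. Thus,

$$P'\Bigl(\Bigl\{\lim_{\gamma\to\infty}h_\gamma(\mathcal{Y}'_\gamma,I'_\gamma)=0\Bigr\}\Bigr)=1.$$

□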

Appendix B: Proof of Corollaries 1, 2 

We state the following lemma, which is a consequence of a theorem due to Pólya (e.g.,
Serfling [35], page 18). The proof is omitted.

Lemma 5. Let $\{u_\gamma(\cdot)\}_{\gamma\in\mathbb{N}}$ be a sequence of increasing step functions,
$u_\gamma:\mathbb{R}\to[0,1]$, that converges pointwise to a continuous increasing function
$u:\mathbb{R}\to[0,1]$ with $\lim_{y\to-\infty}u(y)=0$, $\lim_{y\to\infty}u(y)=1$ and
$0<u(y_1)=u(y_2)<1\Rightarrow y_1=y_2$. Define $q_\gamma(p)=\inf\{y\in\mathbb{R}:\ u_\gamma(y)\ge p\}$,
$q(p)=\inf\{y\in\mathbb{R}:\ u(y)\ge p\}$. Then for all $K$ a compact subset of $(0,1)$,
$\lim_{\gamma\to\infty}\sup_{p\in K}\{|q_\gamma(p)-q(p)|\}=0$.

B.1. Proof of Corollary 1

Proof. As $m_\gamma f$ and $mf$ may have different supports, we extend the definition of
$\xi_s$ by

$$\forall p\in\mathbb{R},\quad \xi_s(p)=\inf\{y\in\mathbb{R}:\ F_s(y)\ge p\}.$$

Let $K$ be a compact subset of $(0,1)$. Then

$$\sup_{p\in K}|\xi_\gamma(p)-\xi_s(p)|\xrightarrow{P}0$$

if from all subsequences one can extract a subsequence that converges a.s. to 0. Let
$\tau:\mathbb{N}\to\mathbb{N}$ be a strictly increasing function. If
$\|F_\gamma-F_s\|_\infty\xrightarrow{L_2}0$ then $\|F_{\tau(\gamma)}-F_s\|_\infty\xrightarrow{L_2}0$ and
$\|F_{\tau(\gamma)}-F_s\|_\infty\xrightarrow{P}0$. Then there exists $\rho:\mathbb{N}\to\mathbb{N}$
strictly increasing such that $\|F_{\tau(\rho(\gamma))}-F_s\|_\infty\xrightarrow{a.s.}0$ and by
Lemma 5, $P(\lim_{\gamma\to\infty}\sup_{p\in K}|\xi_{\tau(\rho(\gamma))}(p)-\xi_s(p)|=0)=1$.

For the uniform L2 convergence, let $p\in(0,1)$ and $\alpha\in\mathbb{R}$. Then
$|F_\gamma(\alpha)-F_s(\alpha)|\le\|F_\gamma-F_s\|_\infty$, so that

$$\{\alpha\in\mathbb{R}:\ F_s(\alpha)\ge p+\|F_\gamma-F_s\|_\infty\}\subset\{\alpha\in\mathbb{R}:\ F_\gamma(\alpha)\ge p\}
\subset\{\alpha\in\mathbb{R}:\ F_s(\alpha)\ge p-\|F_\gamma-F_s\|_\infty\},$$

and

$$\inf\{\alpha\in\mathbb{R}:\ F_s(\alpha)\ge p+\|F_\gamma-F_s\|_\infty\}\ge\inf\{\alpha\in\mathbb{R}:\ F_\gamma(\alpha)\ge p\}
\ge\inf\{\alpha\in\mathbb{R}:\ F_s(\alpha)\ge p-\|F_\gamma-F_s\|_\infty\}.$$

Hence, $\forall p\in(0,1)$, $\xi_s(p+\|F_\gamma-F_s\|_\infty)\ge\xi_\gamma(p)\ge\xi_s(p-\|F_\gamma-F_s\|_\infty)$.
Further, $f$ has compact support by hypothesis, so there exists $b>0$ such that the
supports of $(m_\gamma f)_{\gamma\in\mathbb{N}}$ and $mf$ are included in $[-b,b]$. So
$\forall p\in(0,1)$, $\gamma\in\mathbb{N}$, $-b\le\xi_\gamma(p)\le b$, $-b\le\xi_s(p)\le b$. By
combining these three inequalities, we have, $\forall p\in(0,1)$:

$$|\xi_s(p)-\xi_\gamma(p)|\le\min\{b,\ \xi_s(p+\|F_\gamma-F_s\|_\infty)\}-\max\{-b,\ \xi_s(p-\|F_\gamma-F_s\|_\infty)\}. \qquad (11)$$

Since $K\subset(0,1)$ is compact, there exists $a\in(0,1)$ such that $K\subset[a,1-a]$. With
the assumed continuity of $F_s$, we have that $\xi_s$ is uniformly continuous on any
subinterval of $[0,1]$ that does not contain zero. Thus, for $\varepsilon>0$, there exists
$\eta\in(0,a/2)$ such that $p\in K$ implies $|\xi_s(p+\eta)-\xi_s(p-\eta)|<\varepsilon$. If
$\|F_\gamma-F_s\|_\infty\le\eta$, then $p+\|F_\gamma-F_s\|_\infty\le p+\eta\le 1-a/2$, and
$\xi_s(p+\|F_\gamma-F_s\|_\infty)\le b$, $p-\|F_\gamma-F_s\|_\infty\ge p-\eta\ge a/2$ and
$\xi_s(p-\|F_\gamma-F_s\|_\infty)\ge -b$, so equation (11) is bounded by $\varepsilon$. If
$\|F_\gamma-F_s\|_\infty>\eta$, then (11) is bounded by $(2b)\,1_{\{\|F_\gamma-F_s\|_\infty>\eta\}}$. Thus,

$$\mathrm{E}\Bigl[\sup_{p\in K}|\xi_\gamma(p)-\xi_s(p)|^2\Bigr]\le\varepsilon^2+4b^2\,P(\|F_\gamma-F_s\|_\infty>\eta).$$

Since $\varepsilon$ was arbitrary and $P(\|F_\gamma-F_s\|_\infty>\eta)\to 0$ as $\gamma\to\infty$, the
result follows. □
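
The quantile bracketing used in this proof can be checked on simulated data. The sketch below assumes a hypothetical design (Uniform(0,1) responses, independent selection with $m(y)=y$), for which $F_s(\alpha)=\alpha^2$ on $[0,1]$ and the extended quantile function is $\xi_s(p)=\sqrt{p}$ for $p\in(0,1]$; the helper names are illustrative. It verifies $\xi_s(p-d)\le\xi_\gamma(p)\le\xi_s(p+d)$ with $d=\|F_\gamma-F_s\|_\infty$ for $p$ in the compact set $K=[0.1,0.9]$.

```python
import math
import random

def draw_sample(n_pop, seed=3):
    """Sorted selected responses under the assumed design (m(y) = y on [0, 1])."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_pop):
        y = rng.random()
        if rng.random() < y:
            kept.append(y)
    kept.sort()
    return kept

def f_s(alpha):
    """Weighted limit c.d.f. F_s(alpha) = alpha^2, clamped to [0, 1]."""
    a = min(max(alpha, 0.0), 1.0)
    return a * a

def xi_s(p):
    """Extended quantile function inf{y : F_s(y) >= p}, defined for every real p."""
    if p <= 0.0:
        return -math.inf
    if p > 1.0:
        return math.inf
    return math.sqrt(p)

def xi_gamma(p, sample):
    """Empirical quantile inf{y : F_gamma(y) >= p} for p in (0, 1]."""
    n = len(sample)
    k = min(max(1, math.ceil(p * n)), n)
    return sample[k - 1]

def sup_distance(sample):
    """||F_gamma - F_s||_inf, checking both sides of every jump."""
    n = len(sample)
    worst = 0.0
    for i, y in enumerate(sample):
        worst = max(worst, abs(i / n - f_s(y)), abs((i + 1) / n - f_s(y)))
    return worst

sample = draw_sample(40000)
d = sup_distance(sample)
probs = [j / 10 for j in range(1, 10)]          # grid inside K = [0.1, 0.9]
brackets_hold = all(
    xi_s(p - d) - 1e-9 <= xi_gamma(p, sample) <= xi_s(p + d) + 1e-9 for p in probs
)
max_quantile_error = max(abs(xi_gamma(p, sample) - xi_s(p)) for p in probs)
print(d, brackets_hold, max_quantile_error)
```

The bracketing is a deterministic consequence of the sup-norm bound, while the size of the quantile error depends on how steep $F_s$ is over $K$, which is where the continuity of $\xi_s$ enters the proof.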



B.2. Proof of Corollary 2 

Proof. If $\|F_\gamma-F_s\|_\infty\xrightarrow{a.s.}0$, then for all $K$ a compact subset of $(0,1)$,
and all $(\omega,x)\in\{(\omega,x):\ \|F'_\gamma-F_s\|_\infty\to 0\}$, we apply Lemma 5 with
$u_\gamma=F'_\gamma(\omega,x)$, $u=F_s$, and obtain that
$P'(\lim_{\gamma\to\infty}\sup_{p\in K}|\xi'_\gamma(p)-\xi_s(p)|=0)=1$. □
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